Here's the basic set of imports and data reading functionality that we established in the Basic Time Series Plotting notebook.
In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, DayLocator
from siphon.simplewebservice.ndbc import NDBC
%matplotlib inline
In [ ]:
# Read in some data
df = NDBC.realtime_observations('42039')
# Trim to the last 7 days
df = df[df['time'] > (pd.Timestamp.utcnow() - pd.Timedelta(days=7))]
Often we wish to create figures with multiple panels of data. It's common to separate variables of different types into these panels. We also don't want to create each panel as an individual figure and combine them in a tool like Illustrator - imagine having to do that for hundreds of plots!
Previously we specified subplots individually with plt.subplot(). We can instead use the subplots function to specify a number of rows and columns of plots in our figure, which returns the figure and all of the axes (subplots) we ask for in a single call:
In [ ]:
# ShareX means that the axes will share range, ticking, etc. for the x axis
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True, figsize=(18, 6))
# Panel 1
ax1.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')
ax1.set_xlabel('Time')
ax1.set_ylabel('Speed')
ax1.set_title('Measured Winds')
ax1.legend(loc='upper left')
ax1.grid(True)
# Set only once here: sharing x means the other panel picks up the same formatting
ax1.xaxis.set_major_formatter(DateFormatter('%m/%d'))
ax1.xaxis.set_major_locator(DayLocator())
# Panel 2
ax2.plot(df.time, df.pressure, color='black', label='Pressure')
ax2.set_xlabel('Time')
ax2.set_ylabel('hPa')
ax2.set_title('Atmospheric Pressure')
ax2.legend(loc='upper left')
ax2.grid(True)
plt.suptitle('Buoy 42039 Data', fontsize=24)
So even with the sharing of axis information, there's still a lot of repeated code. This current version with just two parameters might still be ok, but:
Iterating over lists is a very useful tool to reduce the amount of repeated code you write. We're going to start out by iterating over a single list with a for loop. Unlike C or other common scientific languages, Python 'knows' how to iterate over certain objects without you needing to specify an index variable and do the bookkeeping on that.
In [ ]:
my_list = ['2001 A Space Odyssey',
           'The Princess Bride',
           'Monty Python and the Holy Grail']
for item in my_list:
    print(item)
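For comparison, here is a small sketch of the index bookkeeping that the direct for loop lets us skip. The list name films is made up for this illustration; both loops visit the same items in the same order.

```python
# Hypothetical list used only for this illustration
films = ['The Princess Bride', 'Monty Python and the Holy Grail']

# C-style approach: manage an index variable by hand
by_index = []
for i in range(len(films)):
    by_index.append(films[i])

# Pythonic approach: let the list hand us each item directly
direct = []
for film in films:
    direct.append(film)

# Both approaches collect exactly the same items
print(by_index == direct)
```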
Using the zip function we can even iterate over multiple lists at the same time with ease:
In [ ]:
my_other_list = ["I'm sorry, Dave. I'm afraid I can't do that.",
                 'My name is Inigo Montoya.',
                 "It's only a flesh wound."]
for item in zip(my_list, my_other_list):
    print(item)
That's really handy, but needing to access each part of each item with an index like item[0] isn't very flexible, requires us to remember the layout of the item, and isn't best practice. Instead we can use Python's unpacking syntax to make things nice and intuitive.
In [ ]:
for reference, quote in zip(my_list, my_other_list):
    print(reference, '-', quote)
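One detail worth keeping in mind when zipping lists together: zip stops as soon as its shortest input runs out, so lists of mismatched length silently lose items. A minimal sketch, using throwaway lists named letters and numbers that are not part of the notebook:

```python
# Throwaway lists of different lengths, for illustration only
letters = ['a', 'b', 'c']
numbers = [1, 2]

# zip pairs items until the shorter list is exhausted; 'c' has no partner
pairs = list(zip(letters, numbers))
print(pairs)

# Unpacking works the same way on each pair
for letter, number in pairs:
    print(letter, '-', number)
```

Checking that the lists have equal length before zipping is a cheap way to catch this.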
Create two new lists named plot_variables and plot_names. Populate them with the variable name and plot label string for windspeed and pressure.
In [ ]:
# Your code goes here
In [ ]:
# %load solutions/zip.py
zip can also be used to "unzip" items.
In [ ]:
zipped_list = [(1, 2),
(3, 4),
(5, 6)]
unzipped = zip(*zipped_list)
print(list(unzipped))
Let's break down what happened there. zip pairs elements from all of the input arguments and hands those back to us. So effectively our zip(*zipped_list) is zip((1, 2), (3, 4), (5, 6)), so the first element from each input is paired (1, 3, 5), etc. You can think of it like unzipping or transposing.
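The same trick combines nicely with unpacking to split a list of pairs back into separate sequences. A small sketch, where the pairs list and the names on the left-hand side are invented for illustration:

```python
# Illustrative (name, color) pairs; not data from the notebook
pairs = [('wind_speed', 'tab:orange'), ('pressure', 'black')]

# zip(*pairs) transposes the pairs; unpacking gives one tuple per "column"
names, line_colors = zip(*pairs)

print(names)
print(line_colors)
```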
We can use the enumerate function to 'count through' an iterable object as well. This can be useful when placing figures in certain rows/columns or when a counter is needed.
In [ ]:
for i, quote in enumerate(my_other_list):
    print(i, ' - ', quote)
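As a sketch of the rows/columns use case, the counter from enumerate can be converted into a grid position with divmod. The panel names and the two-column layout here are assumptions chosen for the example.

```python
# Hypothetical panel names and grid width, for illustration only
panels = ['a', 'b', 'c', 'd']
ncols = 2

positions = {}
for i, name in enumerate(panels):
    # divmod gives (i // ncols, i % ncols), i.e. the row and column index
    row, col = divmod(i, ncols)
    positions[name] = (row, col)

print(positions)
```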
Combine enumerate and zip to produce the following output:
0 - 2001 A Space Odyssey - I'm sorry, Dave. I'm afraid I can't do that.
1 - The Princess Bride - My name is Inigo Montoya.
2 - Monty Python and the Holy Grail - It's only a flesh wound.
In [ ]:
# Your code goes here
In [ ]:
# %load solutions/enumerate.py
You're probably already familiar with Python functions, but here's a quick refresher. Functions are used to house blocks of code that we can run repeatedly. Parameters are given as inputs, and values are returned from the function to where it was called. In the world of programming you can think of functions like paragraphs: they encapsulate a complete idea/process.
Let's define a simple function that returns a value:
In [ ]:
def silly_add(a, b):
    return a + b
We've re-implemented add, which isn't incredibly exciting, but that could be hundreds of lines of a numerical method, making a plot, or some other task. Using the function is simple:
In [ ]:
result = silly_add(3, 4)
print(result)
Write a function myfunc that returns a power of 2 (e.g. myfunc(4) returns 2^4).
In [ ]:
# Your code goes here
In [ ]:
# %load solutions/functions.py
Let's create a function that reads in buoy data and trims it down to the last 7 days, given only the buoy number.
In [ ]:
def read_buoy_data(buoy, days=7):
    # Read in some data
    df = NDBC.realtime_observations(buoy)
    # Trim to the requested number of days (7 by default)
    df = df[df['time'] > (pd.Timestamp.utcnow() - pd.Timedelta(days=days))]
    return df
In [ ]:
df = read_buoy_data('42039')
df
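The days=7 in the function signature is a default argument: callers can leave it out or override it. Here is a minimal sketch of the same pattern that runs without a network connection. The function trim_recent, its now parameter, and the sample timestamps are all hypothetical stand-ins, not part of siphon or pandas.

```python
from datetime import datetime, timedelta, timezone


def trim_recent(times, days=7, now=None):
    """Hypothetical helper: keep timestamps newer than `days` days before `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [t for t in times if t > cutoff]


# Made-up observation times 1, 3, 8, and 20 days before a fixed reference time
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
times = [now - timedelta(days=d) for d in (1, 3, 8, 20)]

default_window = trim_recent(times, now=now)         # uses days=7
short_window = trim_recent(times, days=2, now=now)   # overrides the default
print(len(default_window), len(short_window))
```

Passing now explicitly keeps the example reproducible; the buoy function above always uses the current time instead.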
Within a function call, we can also set optional arguments and keyword arguments (abbreviated args and kwargs in Python). Args are used to pass a variable length list of non-keyword arguments. This means that args don't have a specific keyword they are attached to, and are used in the order provided. Kwargs are arguments that are attached to specific keywords, and therefore have a specific use within a function.
In [ ]:
def arg_func(*argv):
    for arg in argv:
        print(arg)
arg_func('Welcome', 'to', 'the', 'Python', 'Workshop')
In [ ]:
# Create a function to conduct all basic math operations, using a kwarg
def silly_function(a, b, operation=None):
    if operation == 'add':
        return a + b
    elif operation == 'subtract':
        return a - b
    elif operation == 'multiply':
        return a * b
    elif operation == 'division':
        return a / b
    else:
        raise ValueError('Incorrect value for "operation" provided.')
In [ ]:
print(silly_function(3, 4, operation='add'))
print(silly_function(3, 4, operation='multiply'))
Kwargs are commonly used in MetPy, matplotlib, pandas, and many other Python libraries (in fact we've used them in almost every notebook so far!).
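A function can also accept an open-ended set of keyword arguments with the ** syntax, which is roughly how plotting libraries let you pass along styling options. The sketch below uses a made-up helper, describe_plot, with made-up default styles; it is not a matplotlib function.

```python
def describe_plot(label, **kwargs):
    """Hypothetical helper: merge styling keyword arguments over some defaults."""
    style = {'color': 'black', 'linestyle': '-'}
    # kwargs is an ordinary dict of whatever keyword arguments were passed
    style.update(kwargs)
    return label, style


gust_label, gust_style = describe_plot('Wind Gust', color='tab:olive', linestyle='--')
pressure_label, pressure_style = describe_plot('Pressure')

print(gust_label, gust_style)
print(pressure_label, pressure_style)
```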
In [ ]:
# A list of names of variables we want to plot
plot_variables = ['wind_speed', 'pressure']
# Make our figure, now choosing number of subplots based on length of variable name list
fig, axes = plt.subplots(1, len(plot_variables), sharex=True, figsize=(18, 6))
# Loop over the list of subplots and names together
for ax, var_name in zip(axes, plot_variables):
    ax.plot(df.time, df[var_name])
    # Set label/title based on variable name--no longer hard-coded
    ax.set_ylabel(var_name)
    ax.set_title(f'Buoy {var_name}')
    # Set up our formatting--note lack of repetition
    ax.grid(True)
    ax.set_xlabel('Time')
    ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))
    ax.xaxis.set_major_locator(DayLocator())
It's a step forward, but we've lost a lot of formatting information. The lines are both blue, the labels are less than ideal, and the title just uses the variable name. We can use some of Python's features like dictionaries, functions, and string manipulation to help improve the versatility of the plotter.
To start out, let's get the line color functionality back by using a Python dictionary to hold that information. Dictionaries can hold any data type and allow you to access that value with a key (hence the name key-value pair). We'll use the variable name for the key and the value will be the color of line to plot.
In [ ]:
colors = {'wind_speed': 'tab:orange', 'wind_gust': 'tab:olive', 'pressure': 'black'}
To access the value, just access that element of the dictionary with the key.
In [ ]:
colors['pressure']
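Looking up a key that is not in the dictionary raises a KeyError. If a fallback is preferable, the get method returns a default instead. A short sketch, using a copy of the dictionary under the name line_colors and an arbitrary fallback color chosen for the example:

```python
# Same contents as the colors dictionary above, under a separate name
line_colors = {'wind_speed': 'tab:orange', 'wind_gust': 'tab:olive', 'pressure': 'black'}

# Direct lookup works for keys that exist
pressure_color = line_colors['pressure']

# 'temperature' is not a key, so get() returns the fallback we supply
fallback_color = line_colors.get('temperature', 'tab:blue')

# Membership can be checked with `in`
has_temperature = 'temperature' in line_colors

print(pressure_color, fallback_color, has_temperature)
```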
Now let's apply that to our plot. We'll use the same code from the previous example, but now look up the line color in the dictionary.
In [ ]:
fig, axes = plt.subplots(1, len(plot_variables), sharex=True, figsize=(18, 6))
for ax, var_name in zip(axes, plot_variables):
    # Grab the color from our dictionary and pass it to plot()
    color = colors[var_name]
    ax.plot(df.time, df[var_name], color=color)
    ax.set_ylabel(var_name)
    ax.set_title(f'Buoy {var_name}')
    ax.grid(True)
    ax.set_xlabel('Time')
    ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))
    ax.xaxis.set_major_locator(DayLocator())
That's already much better. We need to be able to plot multiple variables on the wind speed/gust plot though. In this case, we'll allow a list of variables for each plot to be given and iterate over them. We'll store this in a list of lists. Each plot has its own list of variables!
In [ ]:
plot_variables = [['wind_speed', 'wind_gust'], ['pressure']]
fig, axes = plt.subplots(1, len(plot_variables), sharex=True, figsize=(18, 6))
for ax, var_names in zip(axes, plot_variables):
    for var_name in var_names:
        # Grab the color from our dictionary and pass it to plot()
        color = colors[var_name]
        ax.plot(df.time, df[var_name], color=color)
    ax.set_ylabel(var_name)
    ax.set_title(f'Buoy {var_name}')
    ax.grid(True)
    ax.set_xlabel('Time')
    ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))
    ax.xaxis.set_major_locator(DayLocator())
In [ ]:
# Create your linestyles dictionary and modify the code below
fig, axes = plt.subplots(1, len(plot_variables), sharex=True, figsize=(18, 6))
for ax, var_names in zip(axes, plot_variables):
    for var_name in var_names:
        # Grab the color from our dictionary and pass it to plot()
        color = colors[var_name]
        ax.plot(df.time, df[var_name], color=color)
    ax.set_ylabel(var_name)
    ax.set_title(f'Buoy {var_name}')
    ax.grid(True)
    ax.set_xlabel('Time')
    ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))
    ax.xaxis.set_major_locator(DayLocator())
In [ ]:
# %load solutions/looping1.py
We're almost back to where we started, but in a much more versatile form! We just need to make the labels and titles look nice. To do that, let's write a function that uses some string manipulation to clean up the variable names and give us an axis/plot title and legend label.
In [ ]:
def format_varname(varname):
    parts = varname.split('_')
    title = parts[0].title()
    label = varname.replace('_', ' ').title()
    return title, label
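To see what the function hands back, here it is applied to the variable names used in this notebook. The function body is repeated so the sketch runs on its own.

```python
def format_varname(varname):
    # Split on underscores; the first piece becomes the shared title
    parts = varname.split('_')
    title = parts[0].title()
    # The full name, with spaces and capitalization, becomes the legend label
    label = varname.replace('_', ' ').title()
    return title, label


speed = format_varname('wind_speed')
gust = format_varname('wind_gust')
pressure = format_varname('pressure')

print(speed)     # ('Wind', 'Wind Speed')
print(gust)      # ('Wind', 'Wind Gust')
print(pressure)  # ('Pressure', 'Pressure')
```

Both wind variables share the title 'Wind', which is what lets them sit on one panel with a common axis label.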
In [ ]:
fig, axes = plt.subplots(1, len(plot_variables), sharex=True, figsize=(18, 6))
linestyles = {'wind_speed': '-', 'wind_gust': '--', 'pressure': '-'}
for ax, var_names in zip(axes, plot_variables):
    for var_name in var_names:
        title, label = format_varname(var_name)
        color = colors[var_name]
        linestyle = linestyles[var_name]
        ax.plot(df.time, df[var_name], color=color, linestyle=linestyle, label=label)
    ax.set_ylabel(title)
    ax.set_title(f'Buoy {title}')
    ax.grid(True)
    ax.set_xlabel('Time')
    ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))
    ax.xaxis.set_major_locator(DayLocator())
    ax.legend(loc='upper left')
In [ ]:
buoys = ['42039', '42022']
fig, axes = plt.subplots(len(buoys), len(plot_variables), sharex=True, figsize=(14, 10))
for row, buoy in enumerate(buoys):
    df = read_buoy_data(buoy)
    for col, var_names in enumerate(plot_variables):
        ax = axes[row, col]
        for var_name in var_names:
            title, label = format_varname(var_name)
            color = colors[var_name]
            linestyle = linestyles[var_name]
            ax.plot(df.time, df[var_name], color=color, linestyle=linestyle, label=label)
        ax.set_ylabel(title)
        ax.set_title(f'Buoy {buoy} {title}')
        ax.grid(True)
        ax.set_xlabel('Time')
        ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))
        ax.xaxis.set_major_locator(DayLocator())
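One caveat with indexing axes[row, col]: by default plt.subplots squeezes out dimensions of length one, so a single row of plots comes back as a 1-D array and two-index lookups fail. Passing squeeze=False keeps the array 2-D regardless of the grid shape. A minimal sketch of the difference, assuming the list of buoys might shrink to a single entry:

```python
import matplotlib.pyplot as plt

# Default behavior: a 1 x 2 grid is squeezed down to a 1-D array of axes
fig_1d, axes_1d = plt.subplots(1, 2)

# squeeze=False always returns a 2-D array, so axes[row, col] keeps working
fig_2d, axes_2d = plt.subplots(1, 2, squeeze=False)

print(axes_1d.shape)
print(axes_2d.shape)

# Two-index lookup is valid only on the unsqueezed array
first_row_second_col = axes_2d[0, 1]

plt.close('all')
```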